Rule Induction for Sentence Reduction

Authors

  • João Cordeiro
  • Gaël Dias
  • Pavel Brazdil
Abstract

Automatic sentence reduction has been an active research topic, and several relevant approaches have recently been proposed. However, in our view many milestones must still be reached before systems approach human-quality sentence simplification. In this work, we propose a new framework that processes huge sets of web news stories and learns sentence reduction rules in a fully automated and unsupervised way. This is our main contribution. Our system is conceptually composed of several modules. In the first, the system automatically extracts paraphrases from online news stories, using new lexically based functions that we have proposed. In the second module, the extracted paraphrases are transformed into aligned paraphrases: the words of the two paraphrase sentences are aligned through DNA-like sequence alignment algorithms that have been adapted for aligning sequences of words. These alignments are then explored, and specific text structures called bubbles are selected. Afterwards, these structures are transformed into learning instances and used in the last module, which exploits Inductive Logic Programming techniques to learn the rules for sentence reduction. Results show that this is a good approach for learning automatic sentence reduction, although some pertinent issues still need future investigation.
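The alignment and bubble-selection steps described in the abstract can be illustrated with a minimal sketch. It assumes a plain Needleman-Wunsch global alignment over words with unit scores, and it assumes that a "bubble" is a run of words present in only one sentence and flanked on both sides by identical aligned words. The abstract specifies neither the adapted scoring nor the exact bubble definition, so the function names `align_words` and `find_bubbles`, the scoring parameters, and the bubble criterion are all hypothetical.

```python
from typing import List, Optional, Tuple

# One aligned position: (word from sentence A or None, word from sentence B or None).
Pair = Tuple[Optional[str], Optional[str]]


def align_words(a: List[str], b: List[str],
                match: int = 1, mismatch: int = -1, gap: int = -1) -> List[Pair]:
    """Global (Needleman-Wunsch) alignment of two word sequences.

    The unit scores are an assumption for illustration only.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((a[i - 1], None))  # word of A aligned with a gap
            i -= 1
        else:
            pairs.append((None, b[j - 1]))  # word of B aligned with a gap
            j -= 1
    pairs.reverse()
    return pairs


def find_bubbles(pairs: List[Pair]) -> List[Tuple[str, List[str], str]]:
    """Return (left context, dropped words, right context) triples.

    Assumed bubble criterion: a maximal run of words that appear only in
    sentence A, flanked on both sides by identical aligned words.
    """
    bubbles = []
    k = 0
    while k < len(pairs):
        if pairs[k][1] is None:
            start = k
            while k < len(pairs) and pairs[k][1] is None:
                k += 1
            left = pairs[start - 1] if start > 0 else None
            right = pairs[k] if k < len(pairs) else None
            if left and right and left[0] == left[1] and right[0] == right[1]:
                dropped = [p[0] for p in pairs[start:k]]
                bubbles.append((left[0], dropped, right[0]))
        else:
            k += 1
    return bubbles


long_sentence = "the president said that the economy is growing".split()
short_sentence = "the president said the economy is growing".split()
alignment = align_words(long_sentence, short_sentence)
print(find_bubbles(alignment))
```

In this toy pair, the only word without a counterpart is "that", so the sketch reports a single candidate bubble with "said" and "the" as its context. In the framework the abstract describes, such structures would then be turned into learning instances for the Inductive Logic Programming module, a step this sketch does not attempt.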

Similar resources

Hierarchical Maximum Pattern Matching with Rule Induction Approach for Sentence Parsing

Chinese parsing has been a highly active research area in recent years. This paper describes a hierarchical maximum pattern matching method integrated with a rule induction approach for sentence parsing on the traditional Chinese parsing task. We analyzed and extracted statistical POS (part-of-speech) tagging information from the training corpus, then used this information for labeling unknown words in...

Suffering from Illness and Euthanasia sentence

Perhaps the most appropriate translation proposed for euthanasia is painless mercy killing. Because the constituent elements of a crime are present, it is considered complicity in murder, and the victim's consent does not affect the nature of the criminal act or the criminal liability of the person who takes the life. One of the issues related to this killing which is disagr...

Design and Implementation of an Intelligent Part of Speech Generator

The aim of this paper is to report on an attempt to design and implement an intelligent system capable of generating the correct part of speech for a given sentence while the sentence is totally new to the system and not stored in any database available to the system. It follows the same steps a normal individual does to provide the correct parts of speech using a natural language processor. It...

Example-Based Sentence Reduction Using Hidden Markov Model

Sentence reduction is the problem of removing redundant words or phrases from an input sentence to create a new sentence in which the gist of the original meaning is unchanged. Almost all previous methods required a syntactic parser before reducing a sentence. However, these methods were difficult to apply to languages for which no reliable parser was available. In this paper, we pro...

Composition and Decomposition of Japanese Katakana and Kanji Morphemes for Decision Rule Induction from Patent Documents

We propose a new method to construct a word list for rule induction from Japanese patent documents. For word segmentation in Japanese, statistical morphological analyzers have been used in many applications. However, the output of these morphological analyzers presents defects when analyzing unknown words, specifically words that contain Kanji/Katakana morphemes. Some words are overly segmented...

Publication date: 2013